Unique permutation hashing

Authors

  • Shlomi Dolev
  • Limor Lahiani
  • Yinnon A. Haviv
Abstract

We propose a new hash function, the unique-permutation hash function, and analyze the performance of its hash computation. Following the notation of [15], the cost of a hash function h is denoted by C_h(k, N) and stands for the expected number of table entries that are checked when inserting the (k+1)-th key into a hash table of size N, where k out of the N table entries are filled by previous insertions. A hash function maps a key to a permutation of the table locations. A hash function h is simple uniform if items are equally likely to be hashed to any table location (in the first trial). A hash function h is random, or strong uniform, if the probability of any permutation being a probe sequence, when using h, is 1/N!, where N is the size of the table. According to [15], the cost of a random hash function, C_0(k, N) = 1 + k/(N − k + 1), is a "lower bound" on hashing performance. It is a lower bound in the sense that if some hash function h′ has a cost C_h′(k, N) < C_0(k, N) for some k and N, then there exists k′ < k such that C_h′(k′, N) > C_0(k′, N). We show that the unique-permutation hash function is not only a simple uniform hash function but also a random hash function, i.e., strong uniform, and therefore has the cost C_0(k, N). Namely, each probe sequence is equally likely to be chosen when the keys are uniformly chosen. Our hash function ensures that each empty table location has the same probability of being assigned a uniformly chosen key. We also show that the expected time for computing the unique-permutation hash function is O(1), and that the expected number of table locations checked before an empty location is found during insertion (or search) is also O(1) for constant load factors α < 1, where α is the ratio between the number of inserted items and the table size. The unique-permutation hash function is applicable to parallel, distributed, and multi-core systems that aim to avoid contention as much as possible.
∗Department of Computer Science, Ben-Gurion University, Beer-Sheva, 84105, Israel. Partially supported by IBM faculty award, NSF grant and the Israeli ministry of defense. Email: {dolev,lahiani,haviv}@cs.bgu.ac.il.
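
The cost formula in the abstract can be checked numerically with a small sketch. The code below is an illustration only, not the paper's construction: the helper names (`c0`, `key_to_permutation`, `insert`, `average_cost`) are hypothetical, and the key-to-permutation mapping uses the factorial number system as an assumed stand-in for a hash that assigns every permutation probability 1/N! under uniform keys. It builds the full permutation for each key, so it does not attempt the O(1) expected computation time claimed in the abstract.

```python
import random
from math import factorial


def c0(k: int, n: int) -> float:
    """Cost of a random hash function as given in the abstract: 1 + k/(N - k + 1)."""
    return 1 + k / (n - k + 1)


def key_to_permutation(key: int, n: int) -> list[int]:
    """Map an integer key to a permutation of range(n).

    Assumption for illustration: the key is reduced mod n! and decoded with
    the factorial number system, so distinct residues give distinct
    permutations and uniform keys give uniform probe sequences.
    """
    index = key % factorial(n)
    slots = list(range(n))
    perm = []
    for i in range(n, 0, -1):
        digit, index = divmod(index, factorial(i - 1))
        perm.append(slots.pop(digit))
    return perm


def insert(table: list, key: int) -> int:
    """Insert key along its probe sequence; return the number of entries checked."""
    for probes, slot in enumerate(key_to_permutation(key, len(table)), start=1):
        if table[slot] is None:
            table[slot] = key
            return probes
    raise RuntimeError("table is full")


def average_cost(k: int, n: int, trials: int, seed: int = 0) -> float:
    """Estimate the cost of inserting the (k+1)-th key into a table of size n."""
    rng = random.Random(seed)
    total = 0
    for _ in range(trials):
        table = [None] * n
        for _ in range(k):
            insert(table, rng.randrange(factorial(n)))
        # Only the probes of the (k+1)-th insertion count toward the cost.
        total += insert(table, rng.randrange(factorial(n)))
    return total / trials


if __name__ == "__main__":
    n, k = 8, 4
    print("formula  :", c0(k, n))
    print("simulated:", average_cost(k, n, trials=20000))
```

With uniformly drawn keys, the simulated average should land close to the value of `c0(k, n)`, which is what the strong-uniformity property predicts for any hash whose probe sequences are uniformly distributed over all N! permutations.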

Similar resources

One Permutation Hashing

Minwise hashing is a standard procedure in the context of search, for efficiently estimating set similarities in massive binary data such as text. Recently, b-bit minwise hashing has been applied to large-scale learning and sublinear time near-neighbor search. The major drawback of minwise hashing is the expensive preprocessing, as the method requires applying (e.g.,) k = 200 to 500 per...

Sufficient conditions for sound hashing using a truncated permutation

In this paper we give a generic security proof for hashing modes that make use of an underlying fixed-length permutation. We formulate a set of five simple conditions, which are easy to implement and to verify, for such a hashing mode to be sound. These hashing modes include tree hashing modes and sequential hashing modes. We provide a proof that for any hashing mode satisfying the five conditi...

Densifying One Permutation Hashing via Rotation for Fast Near Neighbor Search

The query complexity of locality sensitive hashing (LSH) based similarity search is dominated by the number of hash evaluations, and this number grows with the data size (Indyk & Motwani, 1998). In industrial applications such as search where the data are often high-dimensional and binary (e.g., text n-grams), minwise hashing is widely adopted, which requires applying a large number of permutat...

Improved Densification of One Permutation Hashing

The existing work on densification of one permutation hashing [24] reduces the query processing cost of the (K,L)-parameterized Locality Sensitive Hashing (LSH) algorithm with minwise hashing, from O(dKL) to merely O(d + KL), where d is the number of nonzeros of the data vector, K is the number of hashes in each hash table, and L is the number of hash tables. While that is a substantial improve...

Permutation grouping: intelligent Hash function design for audio & image retrieval

The combination of MinHash-based signatures and Locality-Sensitive Hashing (LSH) schemes has been effectively used for finding approximate matches in very large audio and image retrieval systems. In this study, we introduce the idea of permutation-grouping to intelligently design the hash functions that are used to index the LSH tables. This helps to overcome the inefficiencies introduced by has...

Journal:
  • Theor. Comput. Sci.

Volume 475

Publication date: 2013